Loss of precision - int -> float or double?

In Java, an int uses 32 bits to represent its value. A float stores a 23-bit mantissa (24 significant bits once the implicit leading bit is counted), so integers with a magnitude greater than 2^24 can have their least significant bits rounded away. For example, 33554435 (or 0x2000003) becomes 33554436 as a float, because floats in that range are spaced 4 apart. A double stores a 52-bit mantissa, so it can represent any 32-bit integer without loss of data.
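A minimal sketch of the difference, using the example value from this answer. The class and method names (IntToFloatDemo, viaFloat, viaDouble) are made up for illustration; the round trip goes through long so that the result of the float conversion is shown without a second narrowing step getting in the way.

```java
class IntToFloatDemo {
    // Widens an int to float, then reads the float back as a whole number.
    static long viaFloat(int value) {
        return (long) (float) value;
    }

    // Same round trip through double.
    static long viaDouble(int value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        int value = 33554435; // 2^25 + 3
        System.out.println(viaFloat(value));  // 33554436
        System.out.println(viaDouble(value)); // 33554435
    }
}
```

The float result lands on the nearest representable neighbour, while the double result is unchanged.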

See also "Floating Point" on Wikipedia.

The mantissa is actually 24 and 53 bits for float and double, respectively. It's just that the highest bit is not stored in the representation, because it's not needed (it is always 1). – slacker May 8 '10 at 22:13.
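The hidden bit can be seen by pulling a float apart with Float.floatToIntBits. The sketch below is only an illustration (HiddenBitDemo and its helper names are invented here): 2^24 - 1 needs 24 significant binary digits, yet only 23 of them appear in the stored fraction field, and the value still survives exactly.

```java
class HiddenBitDemo {
    // The 23 explicitly stored fraction bits of a float.
    static int storedFraction(float f) {
        return Float.floatToIntBits(f) & 0x007FFFFF;
    }

    // The 8-bit exponent field, stored with a bias of 127.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    public static void main(String[] args) {
        float f = 16777215; // 2^24 - 1: twenty-four 1 bits in binary
        System.out.println(Integer.toHexString(storedFraction(f))); // 7fffff (23 bits)
        System.out.println(biasedExponent(f) - 127);                // 23
        System.out.println((long) f);                               // 16777215, still exact
    }
}
```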

No, float and double are fixed-length too - they just use their bits differently. Read more about how exactly they work in the Floating-Point Guide. Basically, you cannot lose precision when assigning an int to a double, because double has 53 bits of precision, which is enough to hold all int values.

But float only has 24 bits of precision, so it cannot exactly represent all int values whose magnitude is larger than 2^24.
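One way to check where exactness ends is a brute-force scan. This is a sketch, not anything from the JLS: FloatBoundary, exactAsFloat and firstInexact are hypothetical names, and the comparison is done in long so that values near Integer.MAX_VALUE are not masked by the saturating float-to-int cast.

```java
class FloatBoundary {
    // True if the int survives the conversion to float unchanged.
    static boolean exactAsFloat(int value) {
        return (long) (float) value == value;
    }

    // Scans upward from 1 for the first positive int that float cannot hold exactly.
    static int firstInexact() {
        int value = 1;
        while (exactAsFloat(value)) {
            value++;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(firstInexact()); // 16777217, i.e. 2^24 + 1
    }
}
```

Above that point some ints are still exact (every even value up to 2^25, for instance), but not all of them.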

Here's what the JLS has to say about the matter (in a non-technical discussion).

JLS 5.1.2 Widening primitive conversion

The following 19 specific conversions on primitive types are called the widening primitive conversions:

    int to long, float, or double
    (rest omitted)

Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision -- that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode.

Despite the fact that loss of precision may occur, widening conversions among primitive types never result in a run-time exception.

Here is an example of a widening conversion that loses precision:

    class Test {
        public static void main(String[] args) {
            int big = 1234567890;
            float approx = big;
            System.out.println(big - (int)approx);
        }
    }

which prints:

    -46

thus indicating that information was lost during the conversion from type int to type float because values of type float are not precise to nine significant digits.

It's not necessary to know the internal layout of floating-point numbers. All you need is the pigeonhole principle and the knowledge that int and float are the same size. An int is a 32-bit type, for which every bit pattern represents a distinct integer, so there are 2^32 int values.

A float is also a 32-bit type, so it has at most 2^32 distinct values. Some floats represent non-integers, so there are fewer than 2^32 float values that represent integers. Therefore, there exists a pair of ints that convert to the same float.

Similar reasoning can be used with long and double.
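The pigeonhole argument only says that colliding pairs exist; a concrete pair for each case is easy to exhibit. The sketch below uses hand-picked values just past each type's exact range, and the class and method names are invented for the example.

```java
class PigeonholeDemo {
    // True if two ints map to the same float.
    static boolean sameFloat(int a, int b) {
        return (float) a == (float) b;
    }

    // True if two longs map to the same double.
    static boolean sameDouble(long a, long b) {
        return (double) a == (double) b;
    }

    public static void main(String[] args) {
        // 2^24 and 2^24 + 1 are distinct ints but one float.
        System.out.println(sameFloat(16777216, 16777217)); // true
        // 2^53 and 2^53 + 1 are distinct longs but one double.
        System.out.println(sameDouble(9007199254740992L, 9007199254740993L)); // true
    }
}
```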

There are two reasons that assigning an int to a double or a float might lose precision:

- There are certain numbers that just can't be represented as a double/float, so they end up approximated.
- Large integer numbers may contain too much precision in the least-significant digits.
